Abstract

We deal with scheduling-type multi-agent projects, where the workflow of each task may 
contain chance events and decision opportunities. In this case, global efficiency requires each 
agent to make his decisions depending on all the current states of the work processes of all 
agents. But if the current state of each task is hidden then we also need to get the agents 
interested in sharing this information truthfully. In this paper, we design a mechanism that 
implements this cooperative and truthful behaviour of all agents. 

1 Introduction 

In this paper, we concentrate on stochastic multi-agent projects in which the utility of the overall 
result is an arbitrary function of the results of the tasks of the different agents. For example, when 
there are parallel tasks and we care only about the latest completion time, then the result of each 
task consists of its completion time and the utility is a function of the maximum of the completion 
times. Most scheduling problems are important cases in point, as well. 

The workflows of most real-life tasks contain chance events and decision opportunities. With 
such tasks, efficiency requires the agents to make the best decisions considering all earlier chance 
events of all agents. Continuing with the example of the parallel tasks, it may happen that an 
agent can choose between a faster and a cheaper way to continue, and if some of the other tasks 
are doing badly then he is required to choose the cheaper way, but if all are doing well then he is 
required to choose the faster way. 

The real difficulty arises from the fact that the workflow, the chance events and these decisions are
private information of the corresponding agent. Furthermore, for example, if an agent can choose 
between a faster and a slower way, which differ only in the cost and the probability distribution 
on the completion time, then no one else can ever verify which way the agent chose. So we want 
to make the agents interested in revealing all their private information, even though they can lie, 
and in making the globally optimal decisions. 

In our simplified model, there is a player called the principal (♀) and she can contract with
some of the other players called agents (♂) to work with. Each agent can work according to his
stochastic decision tree, which contains decision points and chance points as internal nodes, and a 
result and a real cost at each leaf. Costless certifiable communication is allowed throughout the 
process and contracts can be made between the principal and each agent that specify a payment 
between them depending on both the achieved result of the agent and the communication between 
them. At the end, the principal pays the money to each agent according to their contract, she gets 
the utility corresponding to the set of the achieved results of the agents, and each agent pays the 
cost of the achieved leaf of his tree. 



This way the decisions of an agent can depend on the earlier messages he gets from the principal, 
which can depend on the earlier messages the principal gets from other agents, which can depend 
on their chance events. So this method gives us the chance to make the agents interested in making 
globally optimal decisions, depending on the decision trees and the earlier chance events of other 
agents, too. This paper is about how this goal can be achieved. 

1.1 Related works 

Our model and (the second price) mechanism can be considered as a stochastic dynamic general- 
isation of the famous and widely used Vickrey-Clarke-Groves mechanism [5]. Although both its 
stochastic but not dynamic and its dynamic but not stochastic generalisations are simple conse- 
quences of it, our stochastic and dynamic generalisation of the model contains further substantial 
aspects. Namely, the actions of the agents are not directly observable, but only a dynamic combi- 
nation of the actions and chance events of each agent is observable, which opens the door for him 
to lie in much more complex and unobservable ways. Even so, efficiency requires here a high level 
of cooperation and we want the agents to be interested in it. 

Our model also generalises some recently studied models such as the two-stage mechanisms (see
Ieong and Sundararajan (2007) [10], Papakonstantinou, Rogers, Gerding and Jennings (2008) [14]), as
well as numerous other problems, such as most scheduling-type problems, which are, or could
have been, modelled as special cases of our model.

The revelation principle was introduced by Gibbard (1973) [7] and extended to the broader
solution concept of Bayesian equilibrium by Dasgupta, Hammond and Maskin (1979) [5], Holmstrom
(1977) [9], and Myerson (1979) [13]. It tells us that any equilibrium of rational communication
strategies for the agents can be simulated by an equivalent incentive-compatible direct-revelation 
mechanism, where a trustworthy mediator maximally centralizes communication and makes hon- 
esty and obedience rational equilibrium strategies for the agents. Accordingly, we will achieve our 
goal by a dynamic direct mechanism. 

Our basic solution concept can be considered as a stochastic dynamic version of the ex post
Nash equilibrium. This concept was discussed as "uniform incentive compatibility" by Holmstrom
and Myerson (1983) and it is increasingly studied in game theory (see Kalai (2002) [11]) and is
often used in mechanism design as a more robust solution concept (Cremer and McLean (1985) [3],
Dasgupta and Maskin (2000) [4], Perry and Reny (2002) [15]).

1.2 Example for the process 

We consider a project that is very risky, but may yield huge utility. It consists of two tasks, the
principal should choose one application per task, and if both succeed in time then the principal
gets a large sum of money, which is 60 here, but if either fails then the success of the other
task is of no use.

The first two trees are descriptions of stochastic decision trees in the following sense. The 
possible executions of a task are the paths from the root to a leaf, where the timing of the events 
is represented by their heights. The solid squares denote decision points, at which the agent can 
choose the branch to continue. The other internal nodes denote such chance points at which the 
branch to continue is chosen randomly with probability 1/2 for each branch. At each leaf, there 
is a tick denoting the success or a cross denoting the failure, and the number shows his total cost. 
(The numbers on the edges are only to show how these asked payments are calculated.) 
Let us now consider two agents for the two different tasks. Each of them applies by sending
his own decision tree but with a constant extra cost. Their applications are the first two trees
in Figure 1. Specifically, the first agent asks for 2 units of money beyond his expenses. His task is
either started at the beginning at a cost of 5 with success probability 1/2, or he makes
preparations at a cost of 1 and, at a later point in time, if the principal asks for it, he can try
to complete the task at a cost of 6 with success probability 1/2. In the other application,
the cost plus the desired expected payoff of the other task is 7 and the success probability
is 1/2.

The principal, after receiving all applications, considers each pair for the two different tasks,
evaluates how much payoff she could obtain with them, and chooses the pair yielding her
the most payoff. The evaluation of the applications shown by the first two trees is described in
the third tree. This tree is a "product" of the two trees; namely, it describes the possible overall 
executions of all tasks. We can construct it by following the applications by time, and creating an 
appropriate branching point if it occurs in one of them, and then continue this on both branches. 
For example, the path to the 5th leaf from the left describes the pair of the paths to the middle 
leaf of the first tree and the left leaf of the second tree. In detail, the first agent chooses to work
in the second way, then the second agent fails to complete the task and then the first agent does
not try to complete his task. At each leaf, there is a tick denoting the success of both tasks or a
cross denoting the failure of either, and the number shows the basic total payment asked by the 
agents. 

A state means the state of the project at a point in time; it can be represented by a point in the
graph of the third tree. We define the values of all states from the bottom to the top, indicated
by italics in Figure 1. Let the value of an endstate be 60 if both tasks succeed and 0 otherwise,
minus the total cost. Values of states on the same edge are the same. The value before a decision
point is the maximum of the values after. The value before each chance point is the average of the
values after.
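
The bottom-up evaluation can be sketched in a few lines of Python. This is only an illustrative sketch: the shape of the product tree is reconstructed here from the two applications described above (asked payments 7 for the first agent's direct attempt, 3 or 9 for his two-step way, and 7 for the second agent), the order of the two chance points in the first way is an assumption, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    value: float  # utility of the principal minus the total asked payment


@dataclass
class Decision:
    children: List["Node"]  # the agent may continue on any branch


@dataclass
class Chance:
    children: List["Node"]  # equally likely branches, as in the example


Node = Union[Leaf, Decision, Chance]


def value(node: Node) -> float:
    """Value of the state just before `node`, computed bottom-up."""
    if isinstance(node, Leaf):
        return node.value
    child_values = [value(child) for child in node.children]
    if isinstance(node, Decision):
        return max(child_values)  # decision point: maximum of the values after
    return sum(child_values) / len(child_values)  # chance point: average


def leaf(both_succeed: bool, asked: float) -> Leaf:
    return Leaf((60 if both_succeed else 0) - asked)


# First way of agent 1 (asks 5 + 2 = 7) together with agent 2 (asks 7).
first_way = Chance([
    Chance([leaf(True, 14), leaf(False, 14)]),   # agent 1 succeeded
    Chance([leaf(False, 14), leaf(False, 14)]),  # agent 1 failed
])


def after_agent2(agent2_succeeded: bool) -> Decision:
    """Second way: agent 1 decides after agent 2's chance event."""
    try_it = Chance([leaf(agent2_succeeded, 16), leaf(False, 16)])  # asks 9 + 7
    stop = leaf(False, 10)                                          # asks 3 + 7
    return Decision([try_it, stop])


second_way = Chance([after_agent2(True), after_agent2(False)])
root = Decision([first_way, second_way])

print(value(first_way), value(second_way), value(root))
```

Under these assumptions the second way has the higher value, which agrees with the principal asking the first agent to make only preparations.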

After the values are calculated, the project is executed in the following way. At each
decision point, the principal asks the corresponding agent to choose the branch with the highest
value. At each chance point, the corresponding agent reports the chance event, and the principal pays
him the signed difference between the values of the states after and before. At the end, the agents
have to deliver the results and are paid the costs shown at their leaves.

For the first agent, this means that the principal asks him - a bit surprisingly - to work in 
the second way, that is to make only preparations, and then the principal either asks him to do 
nothing and he gets 3, or she asks him to try to complete the task, and he gets 9 and ±30 for the 
risk, so he gets 39 if he succeeds, and he pays 21 if he fails. For the second agent this means that 
he gets 7 and ±12 for the risk, so if he succeeds then he gets 19, but if he fails then he pays 5. 
(Note that these extremely high risks are more the peculiarity of this simple and risky example
than of this process.)

This way, the payoff of the principal is surely equal to the value of the starting state because at 
the end, she gets the value of the endstate, and whenever the value of the current state changes, 
she pays this difference. 
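
As a quick check of this claim on the example, the following sketch recomputes the payments and the principal's payoff in each possible outcome. The state values are the ones implied by the description above (they are hard-coded assumptions, not read from Figure 1), and the helper names are hypothetical.

```python
# Values of the presumed states in the example (second way of the first agent).
V_START = 2          # before the second agent's chance point
V_AGENT2_OK = 14     # second agent succeeded
V_AGENT2_FAIL = -10  # second agent failed
V_BOTH_OK = 44       # both succeeded: 60 - (9 + 7)
V_AGENT1_FAIL = -16  # first agent tried and failed: 0 - (9 + 7)


def payment(asked, chance_events):
    """Asked payment plus the signed value differences (after - before)."""
    return asked + sum(after - before for before, after in chance_events)


def principal_payoff(utility, payments):
    return utility - sum(payments)


agent2_ok = payment(7, [(V_START, V_AGENT2_OK)])
agent2_fail = payment(7, [(V_START, V_AGENT2_FAIL)])
agent1_ok = payment(9, [(V_AGENT2_OK, V_BOTH_OK)])
agent1_fail = payment(9, [(V_AGENT2_OK, V_AGENT1_FAIL)])
agent1_idle = payment(3, [])  # asked to do nothing: no chance event

scenarios = {
    "both succeed": principal_payoff(60, [agent1_ok, agent2_ok]),
    "agent 1 fails": principal_payoff(0, [agent1_fail, agent2_ok]),
    "agent 2 fails": principal_payoff(0, [agent1_idle, agent2_fail]),
}
print(scenarios)
```

In all three outcomes the principal ends up with the value of the starting state.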

2 The basic model 

2.1 General notions 

For any symbol $x$, the definition of $x_i$ will also define $x$ as the vector of all meaningful $x_i$, and $x_S$
as the vector of $x_i$ for all $i \in S$. If $x$ is a vector of elements and $R$ is a vector of sets then $x \in R$
means $\forall i : x_i \in R_i$. $x_{-j}$ means the vector of all $x_i$ except $x_j$, and $(x_{-j}, y)$ means the vector $x$ with
$x_j$ replaced by $y$.

The concept of actions of a player refers to what he effectively does, and when he does them. 
The concept of information of a player refers to what he knows and believes at a particular point 
in time. (And he always knows the current time.) The strategy set $Str_y$ of a player $y$ is the set of
all functions that assign a feasible action to each possible information. Therefore, a game can be
defined by the set of the players, their action sets, information histories and payoffs. A subgame
means the game from a point in time, given the earlier actions and random events. Formally, state
is a synonym of subgame, but grammatically, state will refer mainly to the starting point in time;
for example, we say that a state $T_1$ is later than another state $T_2$ if $T_1$ is a subgame of $T_2$. The
dependence of any function or relation $f$ on the game $G$ is denoted by the form $f^G$, but if $G$ is the
default game then we may omit this argument.

For the sake of lucidity and gender neutrality, we use feminine or masculine pronouns depending
on the gender (♀ or ♂) assigned to the player.

2.2 Interpretation 

There is a principal (♀) who wants to get a particular project completed by some agents (♂). The
principal can be considered as the designer of the mechanism. We consider her as a player but
with a fixed and publicly known strategy, so we do not deal with her in the equilibrium concept.

At the beginning, she can negotiate with the agents about the payment system, namely, how 
their payments depend on their achievements as well as the information they send and receive 
during the work. Then the agreed agents can start to work. 

Each agent can affect his own work, for example by using more or fewer workers, or by hurrying
more at the price of more fatigue, which is equivalent to some more cost. Furthermore, each agent gets
feedback from his work, such as about unexpected failures, or simply about faster or slower progress.
These dynamics are described by his decision tree, and each agent knows his own tree.

The players cannot completely observe the executions of the other tasks; for example, it can
happen that no one else can tell whether the completion time of a task of an agent was achieved
by fast and expensive work with worse luck or by slow and cheap work with better luck.
However, some other aspects which directly affect the utility of the overall outcome, such as the 
completion time, are required to be certifiable. This requirement is handled by the concept of 
result. 
The result can be interpreted as the certifiable aspects of the execution of one task, and the
requirement is that these aspects of all players must determine the utility of the outcome of the 
overall execution for the principal. 

To put it the other way around, let us call some possible executions of a particular task equiv- 
alent if the utility is always the same with all executions, no matter what the executions of the 
others are. Then the results are the equivalence classes of the executions, and we require them to 
be certifiable. 

For the sake of generality, we allow the principal to also have a decision tree and we allow the 
agents to benefit from different results. Thus, the type of each player consists of a decision tree 
and a utility function. 

For the example (Section 1.2), these concepts are the following.

player      decision tree          result                         utility function
principal   trivial                trivial                        60 if both tasks succeed, otherwise 0
agent 1     1st tree in Figure 1   success or failure of task 1   −(cost of his achieved leaf)
agent 2     2nd tree in Figure 1   success or failure of task 2   −(cost of his achieved leaf)

For each player, we assume that he/she has some basic information, such as his/her own workflow,
but we do not assume anything about how much he/she knows about the other things. We handle
this in the following way. We define the model as essentially of perfect information, but we are
looking for a Nash equilibrium in which every player uses only his/her basic information. Such
an equilibrium implies that, no matter how much information the players really have beyond
their basic information, this strategy profile is surely a Nash equilibrium. This can be considered as
a stochastic dynamic version of ex-post Nash equilibrium.

We do not require the players to know the decision trees and the utility functions of others;
nevertheless, these will be defined as parameters of the game. To handle this, we use a common
strategy set for the games with all possible values of these parameters.



2.3 Model 



To understand the model, we strongly suggest reading the Interpretation (2.2). 

Our model is the following game, denoted by G = G(t, u). It is important that the strategy set 
of each player will not depend on the vectors t and u. 

Players. There is a player $C$ called the principal (♀) and some players $1, 2, \ldots, n$ called
agents (♂). Let $Ag = \{1, 2, \ldots, n\}$ and $Pl = Ag \cup \{C\}$. Usually, $i$ and $j$ will refer to an agent.

Each player $y$ has a decision tree $t_y$, which is a rooted branching tree structure consisting of
the following terms. The internal nodes of the tree are of two kinds: the chance points and the
decision points. To each chance point there is assigned a probability distribution on its children.
There is a positive real point in time assigned to every internal node, which is later than the time
of its parent (if it exists). The third kind of nodes are the leaves. Each leaf $l$ has a result $r(l)$
assigned. For the sake of simplicity, we assume that all points in time of all trees are different. We
denote the set of leaves in $t_y$ by $L_y$, and $r$ will denote not a vector but a (multi)set.

Executing $t_y$ means the following process.

• We start from the root. 
• If the current node is a decision point then, at its point in time, the agent moves to an
arbitrary child of it.

• If the current node is a chance point then, at its point in time, we move to a random child
chosen with the probabilities assigned to the branches, independently of the agent's strictly
earlier actions, of the not strictly earlier actions of all other players and of all other random
choices. We call these choices chance events.

• If the current node is a leaf then the process ends. We denote this leaf by $l_y$ and the result
of this leaf by $r_y$.
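
The execution process can be illustrated by a short simulation. This is a minimal sketch under stated assumptions: the class names, the `choose` callback standing for the agent's decisions, and the time stamps of the example tree are all hypothetical; only the structure (decision points, chance points with probabilities, leaves with results) follows the definition above.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Leaf:
    result: str  # certifiable result r(l) of this leaf


@dataclass
class DecisionPoint:
    time: float
    children: List["Node"]


@dataclass
class ChancePoint:
    time: float
    children: List["Node"]
    probabilities: List[float]  # one probability per child


Node = Union[Leaf, DecisionPoint, ChancePoint]


def execute(root: Node,
            choose: Callable[[DecisionPoint], int],
            rng: random.Random) -> Tuple[Leaf, List[Tuple[float, int]]]:
    """Walk from the root to a leaf; return the leaf and the chance events."""
    node = root
    chance_events = []  # (time, index of the branch drawn)
    while not isinstance(node, Leaf):
        if isinstance(node, DecisionPoint):
            index = choose(node)  # the agent moves to a child of his choice
        else:
            index = rng.choices(range(len(node.children)),
                                weights=node.probabilities)[0]
            chance_events.append((node.time, index))
        node = node.children[index]
    return node, chance_events


# First agent's tree from Section 1.2; the time stamps are made up.
attempt_now = ChancePoint(2.0, [Leaf("success"), Leaf("failure")], [0.5, 0.5])
attempt_later = ChancePoint(4.0, [Leaf("success"), Leaf("failure")], [0.5, 0.5])
agent1_tree = DecisionPoint(1.0, [
    attempt_now,
    DecisionPoint(3.0, [attempt_later, Leaf("failure")]),
])

# Prepare first, then decline to attempt the task: no chance point is reached.
leaf, events = execute(agent1_tree, lambda node: 1, random.Random(0))
print(leaf.result, events)
```

Costs are deliberately left out of the tree here, since in the model they enter through the utility function of the agent.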

The action sets of the players are the following. 

• Each agent can send time-stamped instant messages to the principal at any time, and vice
versa, but only a finite number in the aggregate.

• At time −1, each agent decides whether he takes part in the game.

• At time 0, the principal chooses a subset $acc$ of the participating agents. We call them
accepted agents, and the subgame after this choice is denoted by $G_0((t, u)_{acc}) = G_{acc}$.

• After time 0, each chosen agent as well as the principal executes his/her decision tree.

• At the end, the principal chooses a payment vector $f : acc \to \mathbb{R} \cup \{-\infty\}$.

The information of each player at a point in time consists of the following.

• the decision trees of all players, 

• the utility functions of all players, {sets of results} → $\mathbb{R}$,

• the earlier actions of all players, 

• the earlier chance events of all players, 

• for an agent, his own chance event at the current time (if exists), 

• for the principal, the already achieved results. 

The payoffs $p$ of the players are defined as follows.

$$p(i) = \begin{cases} u_i(l_i, r) + f_i & \text{if } i \in acc, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

$$p(C) = u_C(l_C, r) - \sum_{i \in acc} f_i. \tag{2}$$

For each player $y$, $p(y)$ is a function of the strategies of all players, and of the chance events.
$p(y)$ restricted to a domain implied by a condition $con$ is denoted by $p(y)[con]$, for example,
$p(i)[i \in acc] = u_i(l_i, r) + f_i$. Instead of $p(y)[s = str]$, we will simply use $p(y)[str]$. We will denote
this dependence only in cases of necessity.

2.4 Goal

Denote the strategy of a player $y$ by $s_y$. The preference of an agent $i$, denoted by $\succeq_i$, is a
partial ordering on the strategy profiles of all players.

Let $E$ denote the expected value, which means (by default) the marginalisation over the chance
events. Keep in mind that $E(p(i))$ is still a function of the strategies of the players. The default
preference of each agent is to achieve higher expected payoff. Namely, $s \succeq_i s'$ iff $E(p(i)[s]) \ge E(p(i)[s'])$.


The basic information of each player means the part of his/her information containing, at a 
point in time, the following. 

• His/her own decision tree and utility function, 

• all messages he/she received strictly earlier, 

• for the principal, the agents' decisions about whether they stay in the game, provided that
time > −1,

• for an agent, whether the principal allowed him to work, provided that time > 0,
• his/her chance events up to and including the current time, 

• his/her earlier actions, 

• for the principal, the set of results r. 

A simple strategy of a player means a strategy that assigns the action to his/her basic
information. Notice that the simple strategy set does not depend on $t$ and $u$.

We call a strategy profile $s$ of the players consisting of simple strategies an information-invariant
Nash equilibrium if every $t$, $u$, $i \in Ag$ and $s'_i \in Str_i$ satisfy $s \succeq_i^{G(t,u)} (s_{-i}, s'_i)$. With
default preferences, this means $E(p(i)[s]) \ge E(p(i)[(s_{-i}, s'_i)])$.

Let $P = acc \cup \{C\}$ be called the participants. We call the sum of the payoffs of all players the payoff
of the system, and we denote it by $p(Pl)$. Clearly,

$$p(Pl) = \sum_{y \in P} u_y(l_y, r). \tag{3}$$



Let the value of a game $H$ be

$$v(H) = \max_{s \in Str^H} E(p^H(Pl)[s]). \tag{4}$$




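Our goal is to find a mechanism under which there exists a strategy profile $s$ of the agents such that

• $E(p(Pl)[s]) = v(G)$, (5)

• $s$ is an information-invariant Nash equilibrium.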






2.5 Interpretation of dependent action sets 

In many cases like in scheduling problems, some actions of agents may be restricted by some 
actions of other agents. For example, one cannot start building the roof of a house before someone 
completes the walls. But the model deals only with independent working processes. To resolve this
problem, we ignore all restrictions, thereby extending the action sets of the restricted agents, but we
define $u_C = -\infty$ at the extended overall executions. The point of this is that if a strategy profile
satisfying the restrictions is an equilibrium in the extended model then it must be an equilibrium
in the original model, as well. But to do that, as $u_C$ is defined not on overall executions but on
sets of results, the restricting and the restricted events are always required to be certifiable; that
is, whether something "impossible" happened must be a function of the results. For the example, this
means that each result of the agent building the walls must contain the completion time $m_c$,
each result of the agent building the roof must contain the starting time $m_s$, and if $m_s < m_c$ then
$u_C = -\infty$.

To sum up, this model can be applied also in such cases when one's possible decisions can be 
restricted by some certifiable events in other tasks. 
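
The following sketch shows one way the wall and roof example could be encoded. It is only an illustration: the function name and the `base_utility` argument (the utility the principal would otherwise assign to the set of results) are assumptions introduced here.

```python
import math


def extended_principal_utility(base_utility: float,
                               wall_completion: float,
                               roof_start: float) -> float:
    """Principal's utility in the extended model.

    wall_completion plays the role of m_c and roof_start of m_s; both are
    assumed to be part of the certifiable results of the two agents.
    """
    if roof_start < wall_completion:
        return -math.inf  # an "impossible" overall execution
    return base_utility


print(extended_principal_utility(100.0, wall_completion=5.0, roof_start=6.0))
print(extended_principal_utility(100.0, wall_completion=5.0, roof_start=4.0))
```

Because the penalty depends only on the two reported times, it is a function of the results, as the construction requires.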

3 Mechanism 

Mechanism means a simple strategy of the principal. Her actions consist of 

• her choice of $acc$;

• her communication;

• the execution of her tree;

• her choice of $f$.

3.1 First price mechanism 

The principal assumes the agents to use the communication protocol below. (For example, she 
defines such an equivalent protocol that cannot be violated.) 

The principal requires the agents to send her a decision tree and a utility function (which
are not necessarily their own ones) at time −1. We call this message of $i$ his application
$app_i = (t^*_i, u^*_i)$. Let us use the notation $app_C = (t_C, u_C)$, and we identify $app$ with $(t^*, u^*)$. Then the
principal considers the strategy profile $s^*$ of $G(app)$ which maximizes $E(p^{G(app)}(Pl))$, and

• she accepts the agents who would be chosen by $s^*$;

• she makes the decisions in her own tree corresponding to $s^*$ and considers her own chance
events in $G(app)$ to be the same as her chance events in $G$;

• whenever an agent $i$ reaches a chance point in $G(app)$, she requires him to send her a chance
event, and she considers the event in this message to be the chance event in $G(app)$;

• whenever an agent $i$ is about to reach a decision point in $G(app)$, she sends him the decision
corresponding to $s^*$, and she considers this to be his decision in $G(app)$;

• at the end, the principal achieves a leaf in $G(app)$, which corresponds to one leaf $l^*_y$ per
participant $y$. If for an agent $i$, $r_i \ne r(l^*_i)$ then $f_i = -\infty$; otherwise, denote the set of the
chance events during the execution of $t^*_i$ in $G(app)$ by $Ch_i$, and for each $ch \in Ch_i$, denote
the states just before and after $ch$ by $T^{ch}_-$ and $T^{ch}_+$. Let

$$d(ch) = v(T^{ch}_+) - v(T^{ch}_-). \tag{6}$$

Then the principal chooses

$$f_i = -u^*_i(l^*_i, r) + \sum_{ch \in Ch_i} d(ch). \tag{7}$$



3.2 Second price mechanism 

We define the value of a set $S$ of applications by $v(S) = v(G(S))$. The value of an application
$app_i$ means

$$v^+(app_i) = v(app) - v(app_{-i}). \tag{8}$$

The second price mechanism is the same as the first price mechanism but the principal pays
$v^+(app_i)$ more to each agent with application $app_i$, namely,

$$f_i = v^+(app_i) - u^*_i(l^*_i, r) + \sum_{ch \in Ch_i} d(ch).$$



3.3 Further definitions and notions 

From now on, the mechanism will be fixed, the default mechanism will be the first price mechanism,
and we will no longer denote the dependence on the strategy of the principal. In detail, $s$ will refer
to $s_{Ag}$ and $p$ will always be restricted to the first price mechanism. We denote the payoffs restricted
to the second price mechanism by $p^2$. Furthermore, we say that a mechanism $s_C$ information-invariantly
Nash implements a goal if there exists a strategy profile $s = s_{Ag}$ such that $(s, s_C)$ is an
information-invariant Nash equilibrium and the goal will be achieved using this strategy profile.
From now on, we assume that all agents use strategies by which $r_i \ne r(l^*_i)$ cannot happen.
(This is only a technical assumption, see Section 9.1.)



We call the state of G(app) induced by the messages the principal sent and received until a 
point in time as the presumed state. We will use this word in analogous cases, too. 

The mechanism can also be defined in subgames of G by making the further actions corre- 
sponding to the optimal strategy profile in the presumed subgame. 

We define the cost price strategy $cp_i$ of an agent $i$ as follows. He takes part in the game
and sends the cost price application $(t_i, u_i)$ to the principal. Then, if the principal chooses him
to work then he always makes the decision which she asks him for, and at each chance point, he
sends her the chance event that occurred.



4 Efficiency and interests in the second price mechanism 

Lemma 1. Denote the state just before and all possible states just after an action of a player by
$T$ and $T_1, T_2, \ldots, T_k$. Then

$$v(T) = \max_i v(T_i). \tag{9}$$

Proof.

$$v(T) \overset{(4)}{=} \max_{str \in Str^T} E(p^T(Pl)[str]) = \max_i \max_{str \in Str^{T_i}} E(p^{T_i}(Pl)[str]) \overset{(4)}{=} \max_i v(T_i). \qquad \square$$


Lemma 2. Denote the state just before and all possible states just after a chance point of an
agent by $T$ and $T_1, T_2, \ldots, T_k$. Let $w_1, w_2, \ldots, w_k$ denote the corresponding probabilities. Then
$v(T) = \sum_i w_i \cdot v(T_i)$, or equivalently,

$$\sum_i w_i \, (v(T_i) - v(T)) = 0. \tag{10}$$

Proof.

$$v(T) \overset{(4)}{=} \max_{str \in Str^T} E(p^T(Pl)[str]) = \sum_i w_i \cdot \max_{str \in Str^{T_i}} E(p^{T_i}(Pl)[str]) \overset{(4)}{=} \sum_i w_i \cdot v(T_i). \qquad \square$$

Lemma 3. If the chance points of a player $y$ correspond to his/her chance points in $G(app)$ with
the same probabilities then

$$E\Big( \sum_{ch \in Ch_y} d(ch) \Big) = 0. \tag{11}$$

Proof. Using the notation of Lemma 2, $E(d(ch)) = \sum_i w_i (v(T_i) - v(T)) \overset{(10)}{=} 0$, so $\sum d(ch)$, summing
over all past chance events $ch$ of $y$, is a martingale, so the expected values of the sums at the end
(left side) and at the beginning (right side) are the same. □
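
Lemma 3 can be checked numerically on a small tree. The sketch below is only an illustration under stated assumptions: the tuple encoding of the tree and the function names are hypothetical, the tree is the second way of the example of Section 1.2 as reconstructed from the text, and every decision is assumed to follow the branch of highest value.

```python
from fractions import Fraction

# A node is ("leaf", value), ("decision", [children]) or
# ("chance", owner, [(probability, child), ...]).


def value(node):
    kind = node[0]
    if kind == "leaf":
        return Fraction(node[1])
    if kind == "decision":
        return max(value(child) for child in node[1])
    return sum(p * value(child) for p, child in node[2])


def expected_sum_d(node, player):
    """E(sum of d(ch)) over the chance events of `player`."""
    kind = node[0]
    if kind == "leaf":
        return Fraction(0)
    if kind == "decision":
        best = max(node[1], key=value)  # the decision asked by the principal
        return expected_sum_d(best, player)
    _, owner, branches = node
    before = value(node)
    total = Fraction(0)
    for p, child in branches:
        d = value(child) - before if owner == player else 0
        total += p * (d + expected_sum_d(child, player))
    return total


half = Fraction(1, 2)
stop = ("leaf", -10)
try_ok = ("chance", "agent 1", [(half, ("leaf", 44)), (half, ("leaf", -16))])
try_bad = ("chance", "agent 1", [(half, ("leaf", -16)), (half, ("leaf", -16))])
tree = ("chance", "agent 2", [
    (half, ("decision", [try_ok, stop])),
    (half, ("decision", [try_bad, stop])),
])

print(value(tree), expected_sum_d(tree, "agent 1"), expected_sum_d(tree, "agent 2"))
```

Exact fractions are used so that the zero expectation is not blurred by rounding.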

Lemma 4.

$$E(p(C)) = v(app). \tag{12}$$

Proof. Denote the actual presumed state by $T$, the achieved presumed endstate by $N$, and denote
the sequence of the presumed chance events of $y$ until $T$ by $Ch_y(T)$, and let $Ch(T) = \bigcup_y Ch_y(T)$.
Then $v(T) - \sum_{ch \in Ch(T)} d(ch)$ is invariant during the game because of (6) and (9), so

$$v(app) = v(G(app)) = v(G(app)) - \sum_{ch \in Ch(G(app))} d(ch) = v(N) - \sum_{ch \in Ch(N)} d(ch) \overset{(3)}{=} \sum_{y \in P} \Big( u^*_y(l^*_y, r) - \sum_{ch \in Ch_y} d(ch) \Big).$$

Taking expectations and using (7), (11) and (2),

$$v(app) = E\Big( u^*_C(l^*_C, r) - \sum_{ch \in Ch_C} d(ch) - \sum_{i \in acc} f_i \Big) = E\Big( u_C(l_C, r) - \sum_{i \in acc} f_i \Big) = E(p(C)). \qquad \square$$

Lemma 5.

$$E(p(i)[s_i = cp_i]) = 0. \tag{13}$$

Proof. If $i \notin acc$ then $p(i) = 0$, and if $i \in acc$ then

$$E(p(i)) = E(u_i(l_i, r) + f_i) \overset{(7)}{=} E\Big( u_i(l_i, r) - u^*_i(l^*_i, r) + \sum_{ch \in Ch_i} d(ch) \Big) \overset{(11)}{=} E(u_i(l_i, r) - u^*_i(l^*_i, r)) = 0. \qquad \square$$

Because of (5), the following theorem shows the efficiency of the cost price strategy profile, in
accord with the revelation principle.

Theorem 6. 

E(p(Pl)[cp])=v(G). (14) 
Proof 1. If $s = cp$ then $app = (t, u)$ and

$$E(p(Pl)) = E(p(C)) + \sum_{i \in Ag} E(p(i)) \overset{(12),(13)}{=} v(G(app)) = v(G(t, u)) = v(G). \qquad \square$$

Proof 2. The presumed execution maximizes the presumed $E(p(Pl))$ in the presumed game $G(app)$.
If $s = cp$ then $G(t, u) = G(app)$ and the actual and the presumed states will always be the same
throughout the game. Consequently,

$$E(p^G(Pl)[cp]) = E(p^{G(app)}(Pl)[s^*]) = \max_{str \in Str^{G(app)}} E(p^{G(app)}(Pl)[str]) \overset{(4)}{=} v(G(app)) = v(G). \qquad \square$$

Theorem 7. In the second price mechanism, the cost price strategy profile is an information- 
invariant Nash equilibrium. 

Proof. The outline of the proof is as follows. We consider an arbitrary agent $j$ and assume that all
other agents use the cost price strategy. Then we will prove that $E(p^2(j)) \le v(G) - v((t,u)_{-j})$, and
equality holds if $j$ also uses the cost price strategy. As the right side is independent of the strategy of
$j$, this will prove the theorem.

Consider a mixed mechanism which is second price for $j$ and first price for the other agents,
namely

• $f_j = v^+(app_j) - u^*_j(l^*_j, r) + \sum_{ch \in Ch_j} d(ch)$,

• $f_i = -u^*_i(l^*_i, r) + \sum_{ch \in Ch_i} d(ch)$, if $i \ne j$.

Clearly, the payoff and the preference of $j$ and of the system are the same here as in the second
price mechanism. Let $p^*$ denote the payoffs here. Whether $j$ is accepted or not,

$$E(p^*(C)) = E(p(C)) - v^+(app_j) \overset{(12),(8)}{=} v(app) - (v(app) - v(app_{-j})) = v(app_{-j}) = v((t,u)_{-j}). \tag{15}$$

Using that for all $i \ne j$, $E(p^*(i)) = E(p(i)) \overset{(13)}{=} 0$,

$$E(p^2(j)) = E(p^*(j)) = E(p(Pl)) - E(p^*(C)) - \sum_{i \ne j} E(p^*(i)) \le v(G) - v((t,u)_{-j}),$$

and (14) shows that equality holds if $j$ also uses the cost price strategy. □

Corollary 8. To summarize Theorems 6 and 7, the second price mechanism information-invariantly
Nash implements $v(G) = E(p(Pl))$.



5 Efficiency and interests in the first price mechanism 

For an application $apl = (tr, ut)$, let $apl + x = (tr, ut + x)$, and for a strategy $str$ with application
$apl$, let $str + x$ mean the same strategy as $str$ but with application $apl + x$. We call a strategy
$cp_i + x_i$ and an application $(t_i, u_i + x_i)$ a fair strategy and a fair application with profit $x_i$. Notice
that if $s_i = cp_i + x_i$ then

$$u^*_i(l^*_i, r) = u_i(l_i, r) + x_i. \tag{16}$$

A cost price strategy but applying with an arbitrary $u^*_i$ is called a fair-like strategy. Clearly,
the cost price strategy is fair, and all fair strategies are fair-like strategies, as well.
Lemma 9.

$$E(p(i)[i \in acc;\ s_i = cp_i - x]) = x. \tag{17}$$

Proof.

$$E(p(i)) \overset{(1)}{=} E(u_i(l_i, r) + f_i) \overset{(7)}{=} E\Big( u_i(l_i, r) - u^*_i(l^*_i, r) + \sum_{ch \in Ch_i} d(ch) \Big) \overset{(11)}{=} E(u_i(l_i, r) - u^*_i(l^*_i, r)) \overset{(16)}{=} x. \qquad \square$$

Theorem 10. In $G_{acc}$, if $\forall i \in acc : s_i = cp_i - x_i$, then

$$E(p(Pl)) = v(G_{acc}). \tag{18}$$


Proof 1. Notice that Lemma 4 works in $G_{acc}$ instead of $G$, too, namely, $E(p^{G_{acc}}(C)) = v(G_0(app_{acc}))$.
Notice also that the only difference between $G_{acc} = G_0((t,u)_{acc})$ and $G_0(app_{acc})$ is that each agent
$i$ gets $x_i$ less payoff in the latter case. Hence,

$$E(p^{G_{acc}}(C)) = v(G_0(app_{acc})) = v(G_{acc}) - \sum_{i \in acc} x_i, \tag{19}$$

thus in $G_{acc}$,

$$E(p(Pl)) = E(p(C)) + \sum_{i \in acc} E(p(i)) \overset{(19),(17)}{=} \Big( v(G_{acc}) - \sum_{i \in acc} x_i \Big) + \sum_{i \in acc} x_i = v(G_{acc}). \qquad \square$$

Proof 2. The presumed execution maximizes the presumed $E(p(Pl))$ in the presumed subgame
$G_0(app_{acc})$. If $\forall i \in acc : s_i = cp_i - x_i$ then on one hand, the only difference between $G_{acc}$ and
$G_0(app_{acc})$ is that each agent $i$ gets $x_i$ less payoff in the latter case; and on the other hand, this
remains the only difference between the actual and the presumed states throughout the game.
Consequently,

$$E(p^{G_{acc}}(Pl)[(cp_i - x_i)_{i \in acc}]) = E(p^{G_0(app_{acc})}(Pl)[s^*]) + \sum_{i \in acc} x_i = \max_{str \in Str^{G_0(app_{acc})}} E(p^{G_0(app_{acc})}(Pl)[str]) + \sum_{i \in acc} x_i$$

$$= \max_{str \in Str^{G_{acc}}} E(p^{G_{acc}}(Pl)[str]) \overset{(4)}{=} v(G_{acc}). \qquad \square$$


Let the signed value of an application $app_i$ be

$$v_\pm(app_i) = \max_{i \in S \subseteq Ag} v(G_0(app_S)) - v(app_{-i}). \tag{20}$$

Lemma 1 shows that the principal chooses the set $acc \subseteq Ag$ for which $v(G_0(app_{acc}))$ is maximal.
Hence,

$$v(app) = v(G(app)) = \max_{S \subseteq Ag} v(G_0(app_S)) = \max\Big( \max_{i \notin S \subseteq Ag} v(G_0(app_S)),\ \max_{i \in S \subseteq Ag} v(G_0(app_S)) \Big) = \max\Big( v(app_{-i}),\ \max_{i \in S \subseteq Ag} v(G_0(app_S)) \Big),$$

and the second term is the greater one iff $i \in acc$. Hence,

$$v^+(app_i) = v(app) - v(app_{-i}) = \max\Big( v(app_{-i}),\ \max_{i \in S \subseteq Ag} v(G_0(app_S)) \Big) - v(app_{-i}) = \max(0, v_\pm(app_i)), \tag{21}$$

and $v_\pm(app_i) > 0$ iff $i \in acc$.

For any $S \ni i$, $v(G_0((app_{S \setminus \{i\}}, app_i - x))) = v(G_0(app_S)) - x$, so

$$v_\pm(app_i - x) = v_\pm(app_i) - x.$$

Thus, for a particular agent, there is exactly one fair application with a given signed value.
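
The relation between the value and the signed value of an application can be tried out on a toy instance. In the sketch below, the table `v0` stands for the values $v(G_0(app_S))$ of the accepted sets; its numbers are invented for illustration only, and the function names are hypothetical.

```python
from itertools import combinations


def subsets(agents):
    items = sorted(agents)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def v(agents, v0):
    """v(app_S): the principal accepts the best subset of the applicants S."""
    return max(v0[s] for s in subsets(agents))


def signed_value(i, agents, v0):
    """Best value among accepted sets containing i, minus v(app_{-i})."""
    best_with_i = max(v0[s] for s in subsets(agents) if i in s)
    return best_with_i - v(agents - {i}, v0)


def value_of_application(i, agents, v0):
    """v^+(app_i) = v(app) - v(app_{-i})."""
    return v(agents, v0) - v(agents - {i}, v0)


agents = frozenset({1, 2})
# Invented values of G_0(app_S) for each accepted set S.
v0 = {frozenset(): 0, frozenset({1}): 3, frozenset({2}): 5, frozenset({1, 2}): 4}

for i in sorted(agents):
    print(i, signed_value(i, agents, v0), value_of_application(i, agents, v0))

# Lowering the application of agent 2 by x lowers every set containing him by x.
x = 1
shifted = {s: val - x if 2 in s else val for s, val in v0.items()}
print(signed_value(2, agents, shifted))
```

In this toy instance agent 1 has a negative signed value and is rejected, while the signed value of agent 2 drops by exactly x after the shift.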

Theorem 11. For an arbitrary agent j and value x, if every other agent is fair-like then the fair 
strategy of j with signed value x gains him the highest expected payoff among all those strategies 
that use an application of this signed value. 

Proof. Given the application of an agent $i$ with fair-like strategy, what $t_i$ is makes no difference
on the payoffs of the other players, nor on their basic information. Thus, given the strategies of
the other players, the payoff of $j$ is independent of $t_{-j}$. Thus, we only need to and will prove the
statement in the game $G' = G((app_{-j}, (t_j, u_j)))$ with $s_{-j} = cp_{-j}$.
If $x < 0$ then $j$ will be rejected, so $p(j) = 0$. Otherwise,

$$E(p(j)) = E(p(Pl)) - E(p(C)) \le v(G') - v(app) = v(G') - v(app_{-j}) - x$$

and Theorem 6 shows that equality holds if $s_j = cp_j$. □

First we describe our results more intuitively, then we present the formal results in the
subsections.

Let the strategy form of a strategy $str$ mean $Form(str) = \{str + x \mid x \in \mathbb{R}\}$. Let $Form(cp_i) =
\{cp_i + x \mid x \in \mathbb{R}\}$ be called the fair strategy form. Let $p[str] = E(p(j)[s_j = str])$.

With a strategy $str_j$ with application $apl_j$, if both applications $apl_j$ and $apl_j - x$ were accepted
then $p[str_j - x] = p[str_j] + x$ (provided that the strategies of the other agents are independent of
$j$'s choice of $x$) and $v(apl_j - x) = v(apl_j) - x$, so $p[str_j] + v(apl_j)$ depends only on $Form(str_j)$.
We call this sum the value of the strategy form, $v(Form(str_j))$.

In each strategy form there exists a single strategy $str_j$, using application $apl_j$, for which $apl_j + x$
would be accepted if $x > 0$ and rejected if $x < 0$. Then $p[str_j + x] = 0$ if $x < 0$, and
$p[str_j + x] = v(Form(s_j)) - x$ if $x > 0$.

$p[s_j] \le v(Form(s_j))$, so the strategy form with the highest value gives him the highest potential
for his expected payoff. Another question is how much of this potential he can expect to exploit,
in the Bayesian sense.



Theorem 11 implies that the fair strategy form has the highest value. Moreover, $p[cp_j - x] = x$
if $v(cp_j) > x$, and $0$ otherwise, which is a quite efficient way of exploiting this
potential.



5.1 Interest in cost price strategy with perfect competition 

The following definition of the perfect competition will roughly describe the limit of the cases when 
the maximum expected payoff that an agent is able to achieve tends to 0. Keep in mind that the 
principal accepts the applications with the positive (or nonnegative) signed values, and notice that 
the weaker (the partial ordering of) the preference the stronger the equilibrium. 

We define the perfect competition in G by the following preferences of the agents. Each 
agent i prefers the strategy profile by which E(p(i)) is nonnegative. Of those, he prefers the 
greater $v_\pm(app_i)$.

Theorem 12. In the first price mechanism with perfect competition, the cost price strategy of all agents
is an information-invariant Nash equilibrium.

Proof. Assume that every agent except $j$ uses cost price strategy. What we have to prove is that
$j$ is interested in using cost price strategy, as well. If $p(j) \ge 0$ then

$$v_\pm(app_j) = v_+(app_j) = v(app) - v(app_{-j}) = p(C) - v(app_{-j}) = E(p(Pl)) - \sum_{i \in Ag} p(i) - v(app_{-j})$$

$$= E(p(Pl)) - p(j) - v(app_{-j}) \le v(G) - v(app_{-j}),$$

and \eqref{psvg} shows that equality holds if $j$ also uses cost price strategy. So this inequality gives
an upper bound for his preference, which can always be achieved by cost price strategy. □

Corollary 13. Summarizing the preceding results together with Theorem 12, the first price mechanism
information-invariantly Nash implements $E(p(Pl)) = v(G)$ with perfect competition.



5.2 First price mechanism with imperfect competition 

In this subsection, we consider $t$ and $u$ to be stochastic rather than nondeterministic variables, and we
assume the information of each agent to be between his basic information and his information in
the original model. A bit more precisely, $u$, $t$ and the information of the players are chosen from
a previously fixed appropriate common distribution. (We do not define it precisely here.) Let $P_i$
and $E_i$ denote the probability and expectation (with respect not only to the chance events) given
the information of $i$.

Let us fix an agent $i$ and $s_{-i}$. Consider a strategy $str_i$ and denote by $apl_i$ (as a stochastic
variable) the application he would send using $str_i$. Let (value) $V = v_\pm(cp_i)$, (difference) $D =
V - v_\pm(apl_i) = v(cp_i) - v(apl_i)$ and $e = E_i(D \mid D < V)$.

In practice, $V$ and $D$ are almost independent and both have "natural" distributions, so $P_i(e <
V) = P_i(E_i(D \mid D < V) < V)$ is usually not smaller than $P_i(D < V)$. This observation shows the
importance of the following theorem.

Theorem 14. If the other agents use fair strategy and $P_i(E_i(D \mid D < V) < V) \ge P_i(D < V)$ holds
then $i$ prefers using a fair strategy, as well.

Proof. Let $g(str_i) = E_i(p(i)[str_i])$ and let $x$ be the number by which $g(cp_i + x)$ is the largest possible.
So we want to prove that $g(cp_i + x) \ge g(str_i)$ for all $str_i$.

Theoretically, let us allow $i$ to submit an application $fair(apl_i)$ in which he submits the fair
application with the signed value $v_\pm(apl_i)$. Denote the fair strategy but with application $fair(apl_i)$
by $F(apl_i)$.

$app_i$ is accepted iff $v_\pm(apl_i) > 0$, or equivalently, $D < V$. By the equation

$$g(str_i) = P_i(i \in acc) \cdot E_i(p(i) \mid str_i, i \in acc),$$

we get $g(F_e) = P_i(e < V) e$, and $g(F(apl_i)) = P_i(D < V) e$, whence we can simply get that

$$g(F_x) - g(s_i) = (P_i(e < V) - P_i(D < V)) e + (g(F(apl_i)) - g(s_i)) + (g(F_x) - g(F_e)).$$

• If $e \le 0$ then $g(s_i) \le 0 = g(F_0) \le g(F_x)$ so $s_i$ cannot be a better strategy. Thus we assume
that $e > 0$. In this case, $(P_i(e < V) - P_i(D < V)) e \ge 0$. If $s_i$ is fair then $D$ is constant, so
both sides are the same.

• Proposition 10 implies that $g(F(apl_i)) - g(str_i) \ge 0$.

• $g(F_x) - g(F_e) \ge 0$ by the definition of $x$.

To sum up, $g(F_x) - g(s_i) \ge 0$, which proves the equilibrium. □
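The quantity $g(F_x) = P_i(x < V)\, x$ used in the proof can be made concrete with a minimal sketch. It assumes, purely for illustration, that agent $i$ believes $V$ takes finitely many values with known probabilities; the distribution, the candidate grid and the names `fair_expected_payoff` and `best_profit` are hypothetical and not taken from the paper.

```python
from fractions import Fraction

def fair_expected_payoff(x, value_dist):
    """g(F_x) = P(x < V) * x for a fair application asking profit x."""
    accept_prob = sum(p for v, p in value_dist.items() if x < v)
    return accept_prob * x

def best_profit(value_dist, candidates):
    """Profit x (among the candidates) maximizing g(F_x)."""
    return max(candidates, key=lambda x: fair_expected_payoff(x, value_dist))

# Assumed belief of agent i about V = v_pm(cp_i).
value_dist = {4: Fraction(1, 4), 10: Fraction(1, 2), 20: Fraction(1, 4)}
x_star = best_profit(value_dist, range(0, 21))
print(x_star, fair_expected_payoff(x_star, value_dist))
```

Asking a larger profit raises the payoff in the case of acceptance but lowers the probability of acceptance; the maximizer of this trade-off plays the role of $x$ in the proof.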

6 Advantages of the first price mechanism over the 
second price mechanism 

6.1 Coalitions 

In this section, we consider the case when we have disjoint sets of agents, called coalitions, and 
each agent in a coalition prefers the higher total expected payoff of all in his coalition. 

First price mechanism. Submitting more applications by the same agent is equivalent to
submitting a joint application, consisting of a tree starting with a decision point and continuing with
all "products" (see Figure 1) of different subsets of the original applications, and a kind of sum 
of the utility functions. Thus, if a coalition played as one agent, called consortium, with their 
combined decision tree, total utility function and their overall information, then this consortium 
would be able to simulate the case when they played as different agents in a coalition. So each 
coalition would not come off badly by forming a consortium, by which we get the original game 
with this new set of agents. 

As a consequence, if the competition remains perfect with this new set of players then the 
mechanism remains efficient. 

Second price mechanism. Consider the case when two agents can submit such applications that 
are useless without each other. Assume that their cost price applications would be accepted. If 
either of them decreased the cost of his application by x then the surplus value of both applications 
would increase by x, so totally they get 2x compensation, which means x more total payoff for 
them. Using this trick, these players can get as much payoff as they want. 
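The arithmetic of this trick can be checked with a small sketch. It assumes a deterministic toy project (no chance events), a hypothetical utility of 100 that is obtained only if both agents are accepted, and made-up costs of 30 and 40; each agent's payoff is taken to be the asked money minus his true cost plus the surplus value $v_+(app_i)$ of his application. All names are illustrative.

```python
from itertools import combinations

def best_value(asks, utility):
    """v(app) in the toy setting: best of utility(S) minus the asks in S."""
    agents = sorted(asks)
    return max(
        utility(frozenset(s)) - sum(asks[a] for a in s)
        for r in range(len(agents) + 1)
        for s in combinations(agents, r)
    )

def second_price_payoffs(asks, costs, utility):
    """Payoff of each agent, assuming every application is accepted:
    asked money minus true cost, plus the surplus value v_+(app_i)."""
    total = best_value(asks, utility)
    payoffs = {}
    for i in asks:
        others = {a: p for a, p in asks.items() if a != i}
        surplus = total - best_value(others, utility)
        payoffs[i] = asks[i] - costs[i] + surplus
    return payoffs

def both_needed(s):
    # Hypothetical utility: the applications are useless without each other.
    return 100 if s == frozenset({"a", "b"}) else 0

costs = {"a": 30, "b": 40}
honest = second_price_payoffs(dict(costs), costs, both_needed)

x = 10  # agent a asks x less than his cost
shaded = second_price_payoffs({"a": costs["a"] - x, "b": costs["b"]}, costs, both_needed)

print(sum(honest.values()), sum(shaded.values()))
```

In this instance the coalition's total payoff rises from 60 to 70, that is, by exactly $x$: agent `a` is paid $x$ less by his contract, but both surplus values grow by $x$.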



6.2 Reliance in the principal 

The mechanism can be modified as follows. The players must use the communication protocol
defined in the mechanism, but right before each chance point of each agent, the principal sends
$d(ch)$ for each possible chance event $ch$ (with $\sum w(ch) d(ch) = 0$).

First price mechanism. This makes $f_i$ a function of $r$ and the communication between $i$
and $C$, and Lemmata 9 and 5 remain true independently of the strategy of the principal.

Theorem 15. In the modified model, if all agents use fair-type strategy then the principal is 
interested in choosing her first price mechanism. 

Proof. With the corresponding modifications on the decision trees, she can consider the agents to 
use cost price strategies. Then 

$$E(p(C)) = E(p^{G_{acc}}(C)) = E(p^{G_{acc}}(Pl)) \le v(G_{acc}),$$
and Theorem 6 shows that equality holds if the principal uses her declared strategy. □

Second price mechanism. The main problem is how to verify that the principal did not
determine any of the $v_+(app_i)$ to be lower. Suppose the principal seals a $t^*$ and $u^*$ right before receiving the
applications and, at the end at the latest, makes them public and verifiable together with all
applications that were accepted or would have been accepted without one other real application, and sends
each $d(ch)$ right before each chance point. Then the agents can rely on her, but it does not seem
possible to establish this reliance while making much less information public.



7 Conclusion 

In the second price mechanism, the cost price strategy profile is an information-invariant Nash 
equilibrium, and this way the expected payoff of the system is the largest possible, including the 
possibilities of making one's decisions depending on the earlier chance events of others. 

In the first price mechanism, with some assumption (or approximation), there is a Nash
equilibrium consisting of the same strategies but with each agent asking a constant more money - his
expected payoff in the case of acceptance - in each application.

The disadvantage of the first price mechanism compared with the second price one is that it requires
competition on each task, and it is fully efficient only with perfect competition.

The advantages of the first price mechanism are

• forming cartels of agents does not worsen the efficiency as long as it does not decrease the 
competition, while in the second price mechanism, if two agents can submit such applications 
that are useless without each other then they may get as much payoff as they want; 

• the agents need (almost) no reliance in the principal, even if they never get to know the 
applications of the others. 



8 Further theoretical observations and special cases 
8.1 An alternative approach of the mechanisms 

As an alternative approach, we define the first price mechanism with the following differences. We
define the contract as a function that declares the payment between two players depending on the
results of all players and the communication between the two players, and an application is an
offer for contract. After the principal receives all applications, she chooses which offers to accept. She
uses the strategy by which her minimum possible expected payoff is the largest possible, where
expectation is with respect only to her own chance events. So she is absolutely mistrustful, meaning
that she always expects the worst possible joint behaviour of all agents.

The alternative second price mechanism is the same as the first price one but at the end, 
the principal pays v + (appi) more money to each agent i beyond the payment according to their 
contract. 

With this approach, a fair application is the offer for using the same communication protocol as
the protocol defined by the original fair application in the original mechanism, but he requires the
principal to send the assignments as in Section 6.2, and the agent $i$ asks for

$$-u_i^*(l_i^*, r) + \sum_{ch \in Ch_i} d^*(ch),$$

where $d^*$ denotes the assignment of the principal.

This approach requires slightly different proofs, but of course, our main results remain true.
From now on, we will use this approach. We note that each application in the original model can 
be matched with an equivalent application in this model, but this is not true in the other direction. 

8.2 Simplifications and the case with no parallel tasks 

The messages the principal sends depend only on the earlier messages she got. Thus, if an agent $i$
is sure that the principal receives no message from anyone else in the time interval $I = [a, b]$ then,
without decreasing the value of his application, he can ask the principal (in the application) to
send him at $a$ the messages she would send him during $I$, as a function of the messages
she would have received from him before. Similarly, if $i$ is sure that the principal does not send any
message to anyone else during $I$ then he can send all his messages of the interval $I$ only at $b$, and
the principal should send him which messages she would send during $I$ depending on the messages
she would have got before.

As a consequence, consider the following project. It consists of two tasks, and the second task
can only be started after the first one is accomplished. The result of each agent for the first task
(called first agent) consists of his completion time $C_1$. The result of each second agent consists of
his starting time $S_2$ and the time $C_2$ he completes; and his decision tree starts with doing nothing
until an optional point in time $S_2$, and then he can start his work. The utility function is of the
form $u'(C_2)$ for some decreasing function $u' : \{time\} \to \mathbb{R}$ if $C_1 \le S_2$, and $-\infty$ otherwise. In
this case, the principal always communicates only with the agent who is just working at the time.
So using the above observation, we can make simplified applications of the following form with the
same values as of the fair applications.

In short, the first agents state for how much money they would complete the first task
depending on the penalty. Then the applied penalty for the chosen second agent is the loss from
the delayed completion, and the penalty for the first agent is how much more the second agent
asks if he can start later. The principal chooses the pair by which she gains the most payoff.

In detail, the form of the application of the first agents is "We ask $h(C_1) - g_1(h)$ money for any
$h : \{time\} \to \mathbb{R}$ chosen by the principal at the beginning", and for the second agents this is "We
ask $u'(C_2) - g_2(S_2)$ money if we can start our work at $S_2$ and we complete it at $C_2$". $h(C_1)$ and
$u'(C_2)$ describe the penalties here. In the simplified fair applications, $g_1$ and $g_2$ are chosen in such
a way that the expected payoff of the agents is independent of the arguments, if the agents use their best
strategies afterwards.

If all applications are so then the principal chooses a pair for which $g_1(g_2)$ is the greatest. Then
she chooses $h = g_2$ for the first agent, and this way, when the second agent starts at $S_2 = C_1$, the
principal gets $u'(C_2) - (g_2(C_1) - g_1(g_2)) - (u'(C_2) - g_2(S_2)) = g_1(g_2)$ payoff.

If a first agent has no choice in his decision tree, that is, his completion time $C_1$ is simply a
random variable, then he should choose $g_1(h) = E(h(C_1)) - c$, where $c$ is his costs plus his
profit.
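The choice $g_1(h) = E(h(C_1)) - c$ can be checked numerically. The sketch below assumes a made-up discrete distribution of $C_1$ and two arbitrary penalty functions $h$; the names `g1` and `expected_payment` are illustrative. It shows that the expected value of the asked money $h(C_1) - g_1(h)$ equals $c$ whatever $h$ the principal picks.

```python
from fractions import Fraction

def g1(h, completion_dist, c):
    """g_1(h) = E(h(C_1)) - c for a first agent with no decisions."""
    return sum(p * h(t) for t, p in completion_dist.items()) - c

def expected_payment(h, completion_dist, c):
    """Expected value of the asked money h(C_1) - g_1(h)."""
    g = g1(h, completion_dist, c)
    return sum(p * (h(t) - g) for t, p in completion_dist.items())

# Assumed distribution of the completion time C_1 (time -> probability).
dist = {10: Fraction(1, 2), 12: Fraction(3, 10), 15: Fraction(1, 5)}
c = Fraction(50)  # costs plus profit of the first agent

def linear(t):
    return 3 * t

def steep(t):
    return t * t

print(expected_payment(linear, dist, c), expected_payment(steep, dist, c))
```

Both calls return 50, so the agent's expected income does not depend on the penalty function, which is the independence property required of the simplified fair applications.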

8.3 Controlling and controlled players 

For an example, consider a task of building a unit of railroad. An agent $i$ can make this task for
a cost of 100, but with 1% probability of failure, which would cause a huge loss of 10,000. Another
agent $j$ could inspect and, in the case of failure, correct the work of $i$ under the following conditions.
The inspection costs 1. If the task was correct then he does nothing else. If not, he detects and
corrects the failure with 99% probability for a further cost of 100, but with 1% probability he does
not detect it, so he does nothing. If both of them use cost price strategy and they are the accepted
agents for the task then the mechanism works in the following way.

At the end, $i$ gets 101.99 but pays 199 compensation if he fails (so in that case he pays 97.01 in
total); $j$ gets 1 if he correctly finds the task to be correct, he gets 200 if the task was wrong but he
corrects it, but he pays 9800 if he misses correcting it.

Of course, with fair applications each of them is paid his profit in addition. It can be checked
that the expected payoff of each agent is his profit independently of the behaviour of the others,
and the payoff of the principal is fixed.
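The figures of this example can be reproduced with exact arithmetic. The sketch below rests on one reading of the mechanism, stated here as an assumption: each agent is paid his cost plus the change in the expected value of the project caused by his own chance events, and the utility of a correct railroad is normalized to 0. Variable names are illustrative.

```python
from fractions import Fraction as F

p_fail = F(1, 100)      # probability that i's work fails
p_detect = F(99, 100)   # probability that j detects (and corrects) a failure
cost_i, cost_inspect, cost_fix, loss = F(100), F(1), F(100), F(10000)

# Expected damage caused by a failure of i, given that j inspects.
damage = p_detect * cost_fix + (1 - p_detect) * loss

# Cost price payments for i: expected total cost up front, damage back on failure.
pay_i = cost_i + p_fail * damage
fine_i = damage

# Cost price payments for j: his costs plus the change in expected value.
pay_j_ok = cost_inspect
pay_j_fixed = cost_inspect + cost_fix + (damage - cost_fix)
pay_j_missed = cost_inspect + (damage - loss)

# Expected payoffs of the agents (payments minus own costs).
payoff_i = pay_i - cost_i - p_fail * fine_i
payoff_j = (1 - p_fail) * (pay_j_ok - cost_inspect) + p_fail * (
    p_detect * (pay_j_fixed - cost_inspect - cost_fix)
    + (1 - p_detect) * (pay_j_missed - cost_inspect)
)

# Payoff of the principal in the three possible outcomes.
principal = {
    "correct": -pay_i - pay_j_ok,
    "fixed": -pay_i + fine_i - pay_j_fixed,
    "missed": -pay_i + fine_i - pay_j_missed - loss,
}
print(float(pay_i), fine_i, pay_j_fixed, pay_j_missed, principal)
```

Under these assumptions both agents have expected payoff 0, and the principal's payoff is the same in all three outcomes.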

8.4 Offer for cooperation 

Consider the case when a player a simply wants to make an offer for cooperation with another 
player c, which c will either accept or reject. This situation is a special case of the first price 
mechanism, where a is the only agent and c is the principal, and the offer of a is his application. 



9 Observations for application 

9.1 Modifications during the process 

In practice, the decision trees can be extremely complex, so the agents cannot be expected to submit
precise fair applications. Therefore, they can present them only in a simplified, approximate way.
Generally, such inaccuracies do not significantly worsen the optimality; nevertheless, this loss can
be reduced much further by the following observation.

Assume that someone whose application has been accepted can refine his decision tree during 
the process. It would be beneficial to allow him to carry out such modifications. The question is: 
on what conditions? 

The answer is to allow him to modify the subtree of his application from the presumed
state, if he pays the difference between the values of the presumed states with the original and the
new applications. Or equivalently, the costs of the leaves of the subtree automatically decrease by
this difference. It is easy to see that whether and how to modify is the same question as which
application to submit among the applications with a given value. Consequently, Theorem 11 shows
that exchanging to the agent's true fair application is in his interest.

Equivalently, each possible modification can be handled as a chance point in the original
application with probabilities 1 and 0 for continuing with the original and the modified application,
respectively. This is because at such a chance point, the principal assigns 0 to the branch of not modifying
and she assigns the difference between the values of the states after and before to the modification.

It may happen that in the beginning it is too costly for some agent to explore the many 
improbable branches of his decision tree, especially if he does not yet know whether his application 
will be accepted; but later however, it would be worth exploring better the ones that became 
probable. This kind of in-process modifications is what we want to make possible. We show that 
the interest of each agent in better scheduling of these modifications is about the same as the 
interest of the system in it. 

The expected payoff of an agent with an accepted fair application is fixed and for a nearly fair 
agent, the small modifications of the other applications have negligible effect. As the modifications 
of each agent have no influence on the payoff of the principal and only this negligible influence on 
the expected payoff of other agents, the change of the expected payoff of the system is essentially 
the same as the change of the expected payoff of this agent. This confirms the above statement.

On the other hand, it is clear that if the utility function alters somewhat then everything can
be rescheduled according to the new goals. Moreover, the principal is interested in describing her
utility function on the schedule that is best according to the preference of the system.

9.2 Risk-averse agents 

Assume that an agent $i$ has a strictly monotone worth function $h_i : \mathbb{R} \to \mathbb{R}$ and he prefers the
higher $E(h_i(p(i)))$.

We define a reasonable application to be the same as the fair application with the only difference
that the agent asks for

$$h_i^{-1}\Big(h_i(-u_i^*(l_i^*, r)) + \sum_{ch \in Ch_i} d^*(ch)\Big),$$

where $d^*$ denotes the value assigned previously to the chance event by the principal. By a reasonable
application, in the case of acceptance, the expected worth of the utility of the agent is independent
of the choices of the principal. If all applications are reasonable then the payoff of the principal
remains fixed. If the agent is risk-neutral then the reasonable application is fair. These are some
reasons why reasonable applications work "quite well". We do not state that it is optimal in any
sense, but a reasonable application may be better than a fair application in the risk-averse case.
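The independence property of reasonable applications can be illustrated numerically. The sketch below assumes an exponential worth function $h(x) = a - b e^{-\lambda x}$ with made-up parameters, a single chance point, and assignments $d^*$ expressed in worth units whose probability-weighted sum is 0; the function names and all numbers are hypothetical.

```python
import math

def make_worth(a, b, lam):
    """Return h(x) = a - b*exp(-lam*x) and its inverse."""
    def h(x):
        return a - b * math.exp(-lam * x)
    def h_inv(y):
        return -math.log((a - y) / b) / lam
    return h, h_inv

def reasonable_payment(h, h_inv, base, assignments):
    """h^{-1}(h(base) + sum of d*(ch)), where base stands for -u*(l*, r)."""
    return h_inv(h(base) + sum(assignments))

def expected_worth(h, h_inv, base, branches):
    """Expected worth over one chance point.

    branches: list of (probability, d) pairs with sum(p * d) == 0.
    """
    return sum(p * h(reasonable_payment(h, h_inv, base, [d])) for p, d in branches)

h, h_inv = make_worth(a=1.0, b=1.0, lam=0.01)
base = 100.0

mild = [(0.5, 0.2), (0.5, -0.2)]      # one possible choice of the principal
skewed = [(0.9, 0.05), (0.1, -0.45)]  # another choice, also with mean 0

print(h(base), expected_worth(h, h_inv, base, mild), expected_worth(h, h_inv, base, skewed))
```

All three printed numbers agree up to rounding, so the expected worth does not depend on which mean-zero assignment the principal chooses. With the identity as worth function the payment reduces to `base` plus the sum of the assignments, which is the fair application.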

We note that the evaluation of reasonable applications can be much more difficult than that of fair
applications, but if for each agent $i$, $h_i(x) = a_i - b_i \cdot e^{-\lambda x}$ then a similar recursion works as in ([I])
and @. 

9.3 Necessity of being informed about the own process 

We assumed that none of the chosen players knew anything better about any chance event of
any other chosen agent. We show here an example that violates this requirement and thereby makes the
mechanism fail. Consider two agents $a$ and $b$ that will surely be accepted. Assume that $a$
believes the probability of an unfavourable event in his work to be 50%, but the other one, $b$,
knows that the probability is 60%, he knows the estimation of $a$ and he also knows that at a
particular decision point of his, he will be asked for the decision corresponding to this chance
event. It can be checked that if the application of $a$ is fair then if $b$ increases the asked payment
in his application in the more probable case by an amount of money and decreases it in the other
case by the same amount then the value of his application remains the same but this way, he bets
1 : 1 with $a$ on an event of 60% probability.
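The size of the resulting loss follows from simple arithmetic, sketched below. The stake of 100 is a made-up number and `expected_gain` is an illustrative helper; only the probabilities 50% and 60% come from the example.

```python
from fractions import Fraction

def expected_gain(stake, p_event):
    """Expected gain of betting `stake` at 1:1 odds on an event of probability p_event."""
    return p_event * stake - (1 - p_event) * stake

believed_by_a = Fraction(1, 2)  # a's estimate of the unfavourable event
known_by_b = Fraction(3, 5)     # probability that b knows to be correct
stake = 100                     # hypothetical amount shifted between the two cases

print(expected_gain(stake, believed_by_a), expected_gain(stake, known_by_b))
```

Evaluated with $a$'s estimate the bet looks neutral (expected gain 0), which is why the value of the application is unchanged, while under the true probability $b$ gains 20 in expectation, that is, 20% of the stake, at the expense of $a$.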

In order to limit such losses, $a$ could rightfully say that a bet can only be increased on
worse conditions. Submitting a reasonable application with a concave worth function achieves something
similar, which is another reason to use it.

9.4 Agents with limited funds 

This subsection is only a suggestion for the cases with such agents, and it is not optimal in any 
sense. 

Our mechanism requires each agent to be able to pay as much money as the maximum possible
damage he could have caused. But in many cases, there may be plenty of agents who cannot
satisfy this requirement. However, accepting such an agent $a$ may be a good decision, if $a$ is reliable
to some degree.

To solve this problem, $a$ should find someone who has enough funds and who takes the
responsibility, for example for an appropriate fee. If the agent is reliable to some degree then he should be
able to find such an insurer player $b$. (It can even be the principal, but considered as another player.)
This method may also be used when $a$ has enough funds, but he is very risk-averse.



Here, $a$ and $b$ work similarly to the controlled and controlling parties in Section 8.3. The
difference is that $b$ does not work here, and he knows the probability distribution of the result of
$a$ not from his own decision tree but from his knowledge about the reliability of $a$. Furthermore,
the role of $b$ here can be combined with his role in Section 8.3.



9.5 Communication 

Under non-ideal circumstances, the messages the players send should be secret. However, the 
model requires the communication to be certifiable. Both requirements can simply be satisfied 
using a cryptographic communication protocol. 



9.6 Applications in genetic programming 

This second price mechanism may also be useful in genetic programming, when we want to find an
efficient algorithm for a problem that can be divided into slightly dependent subproblems,
while the subprograms for these subproblems should cooperate, and their overall achievement is
what we can evaluate. In this case we should experiment simultaneously with subprograms that
also submit applications, and we simulate here a competitive market with a principal that uses
our second price mechanism.

If the form of all possible cost price applications for each subproblem is simple enough then we
may need to experiment with fewer parameters in each subproblem than in the original problem, and
that is why this method converges faster.